The vision community has explored numerous pose-guided human editing methods because of their extensive practical applications. Most of these methods still use an image-to-image formulation in which a single image is given as input to produce an edited image as output. However, the problem is ill-defined when the target pose differs significantly from the input pose. Existing methods then resort to in-painting or style transfer to handle occlusions and preserve content. In this paper, we explore the use of multiple views to reduce the problem of missing information and generate an accurate representation of the underlying human model. To fuse the knowledge from multiple viewpoints, we design a selector network that takes the pose keypoints and texture from the images and generates an interpretable per-pixel selection map. The encodings from a separate network (trained on a single-image human reposing task) are then merged in the latent space. This enables us to generate accurate and visually coherent images for different editing tasks. We show the application of our network on two newly proposed tasks: multi-view human reposing and mix-and-match human image generation. Additionally, we study the limitations of single-view editing and the scenarios in which multi-view editing provides a much better alternative.
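The fusion step described above can be illustrated with a minimal sketch. It assumes (this is not stated in the abstract) that the selection map is a per-pixel softmax over views and that fusion is a weighted sum of per-view encodings. The names `fuse_views`, `features`, and `logits` are hypothetical, and each encoding is reduced to one scalar per pixel for brevity.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_views(features, logits):
    """Fuse per-view encodings with a per-pixel selection map.

    features[v][p]: encoding of view v at pixel p (a scalar here, for brevity).
    logits[v][p]:   hypothetical selector score for view v at pixel p.
    Returns (fused, selection), where selection[p] holds the view weights
    for pixel p and sums to 1, so it can be inspected directly.
    """
    n_views = len(features)
    n_pixels = len(features[0])
    fused, selection = [], []
    for p in range(n_pixels):
        weights = softmax([logits[v][p] for v in range(n_views)])
        selection.append(weights)
        fused.append(sum(weights[v] * features[v][p] for v in range(n_views)))
    return fused, selection
```

In this toy form, a pixel whose score strongly favours one view takes its value almost entirely from that view, which is what makes the map readable as a selection.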
Translated by Google Translate
When answering natural language questions over knowledge bases (KBs), incompleteness in the KB can naturally lead to many questions being unanswerable. While answerability has been explored in other QA settings, it has not been studied for QA over knowledge bases (KBQA). We first identify various forms of KB incompleteness that can result in a question being unanswerable. We then propose GrailQAbility, a new benchmark dataset, which systematically modifies GrailQA (a popular KBQA dataset) to represent all these incompleteness issues. Testing two state-of-the-art KBQA models (trained on original GrailQA as well as our GrailQAbility), we find that both models struggle to detect unanswerable questions, or sometimes detect them for the wrong reasons. Consequently, both models suffer significant loss in performance, underscoring the need for further research in making KBQA systems robust to unanswerability.
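One way a question becomes unanswerable can be shown with a toy example. The sketch below assumes a KB stored as a set of (subject, relation, object) triples and illustrates only one form of incompleteness, a missing fact. The triples and the `answer` helper are hypothetical and unrelated to the GrailQAbility code.

```python
def answer(kb, entity, relation):
    """Return the set of objects for (entity, relation), or None if the KB has no answer."""
    objects = {o for (s, r, o) in kb if s == entity and r == relation}
    return objects or None

# A complete toy KB and a copy with one fact dropped (hypothetical data).
full_kb = {
    ("Paris", "capital_of", "France"),
    ("Berlin", "capital_of", "Germany"),
}
incomplete_kb = full_kb - {("Paris", "capital_of", "France")}
```

The same question, `answer(kb, "Paris", "capital_of")`, is answerable over `full_kb` and unanswerable over `incomplete_kb`; a robust system should report the latter case rather than return a wrong answer.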
Bike sharing systems often suffer from poor capacity management as a result of variable demand. These systems would benefit from models that predict demand in order to balance the number of bikes stored at each station. In this paper, we apply a graph neural network model to predict bike demand in the New York City Citi Bike dataset.
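The abstract does not specify the architecture, so the following is only a generic sketch of one graph message-passing step over a station graph. It assumes mean aggregation over neighbouring stations and a ReLU activation; `gnn_layer`, the weights, and the data layout are hypothetical.

```python
def gnn_layer(features, neighbors, w_self=0.5, w_neigh=0.5):
    """One mean-aggregation message-passing step.

    features[i]:  scalar feature of station i (e.g. recent demand).
    neighbors[i]: list of station indices adjacent to station i.
    w_self, w_neigh: fixed weights standing in for learned parameters.
    """
    out = []
    for i, h in enumerate(features):
        nbrs = neighbors[i]
        mean = sum(features[j] for j in nbrs) / len(nbrs) if nbrs else 0.0
        out.append(max(0.0, w_self * h + w_neigh * mean))  # ReLU
    return out
```

Each station's updated value mixes its own demand with that of nearby stations, which is the property that motivates a graph model for this kind of data.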
Accurate segmentation of live cell images has broad applications in clinical and research contexts. Deep learning methods can segment cells with high accuracy; however, developing such models requires access to high-fidelity images of live cells. Such images are often unavailable because of resource constraints, such as limited access to high-performance microscopes, or because of the nature of the organisms studied. Segmenting low-resolution images of live cells is a difficult task. This paper proposes a method for live cell segmentation on low-resolution images that performs super-resolution as a pre-processing step in the segmentation pipeline.
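Medical imaging is a cornerstone of therapy and diagnosis in modern medicine. However, the choice of imaging modality for a particular diagnostic or therapeutic task typically involves a trade-off between the feasibility of using that modality (e.g., short wait times, low cost, fast acquisition, reduced radiation or invasiveness) and the expected performance on the clinical task (e.g., diagnostic accuracy, efficacy of treatment planning and guidance). In this work, we aim to apply the knowledge learned from the less feasible but better-performing (superior) modality to guide the use of the more feasible but under-performing (inferior) modality and steer it towards improved performance. We focus on deep learning for image-based diagnosis. We develop a lightweight guidance model that leverages the latent representation learned from the superior modality when training a model that consumes only the inferior modality. We examine the advantages of our method in two clinical applications: multi-task skin lesion classification from clinical and dermoscopic images, and brain tumor classification from multi-sequence magnetic resonance imaging (MRI) and histopathology images. In both scenarios, we show a boost in the diagnostic performance of the inferior modality without requiring the superior modality. Furthermore, in the case of brain tumor classification, our method outperforms the model trained on the superior modality while producing results comparable to those of the model that uses both modalities during inference.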
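We develop an autonomous navigation algorithm for a robot operating in two-dimensional environments cluttered with obstacles of arbitrary convex shape. The proposed navigation approach relies on hybrid feedback to guarantee global asymptotic stabilization of the robot at a predefined target location while ensuring forward invariance of the obstacle-free workspace. The main idea is to design an appropriate switching strategy between a move-to-target mode and an obstacle-avoidance mode, based on the proximity of the robot to the nearest obstacle. The proposed hybrid controller generates continuous velocity input trajectories when the robot is initialized away from the boundaries of the obstacles. Finally, we provide an algorithmic procedure for the sensor-based implementation of the proposed hybrid controller and validate its effectiveness through simulation results.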
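The two-mode structure can be sketched as follows. This is a simplified illustration, not the controller from the paper: it assumes a hysteresis switch on the distance to the nearest obstacle (thresholds `eps_in` and `eps_out` are hypothetical), a single circular obstacle, and a tangential velocity in avoidance mode. It carries none of the stability guarantees stated above.

```python
import math

def next_mode(mode, dist, eps_in=0.5, eps_out=1.0):
    """Hysteresis switch on the distance to the nearest obstacle.

    Enter avoidance when dist <= eps_in; return to target mode only
    when dist >= eps_out. The gap between thresholds prevents chattering.
    """
    if mode == "move_to_target" and dist <= eps_in:
        return "avoid_obstacle"
    if mode == "avoid_obstacle" and dist >= eps_out:
        return "move_to_target"
    return mode

def velocity(mode, pos, target, obstacle_center, gain=1.0):
    """Velocity command for the current mode (2D points as (x, y) tuples)."""
    if mode == "move_to_target":
        # Proportional feedback straight towards the target.
        return (gain * (target[0] - pos[0]), gain * (target[1] - pos[1]))
    # Avoidance: move along the tangent of the circle around the obstacle
    # centre (outward normal rotated by 90 degrees). Assumes pos != centre.
    nx, ny = pos[0] - obstacle_center[0], pos[1] - obstacle_center[1]
    norm = math.hypot(nx, ny)
    return (-gain * ny / norm, gain * nx / norm)
```

A simulation loop would call `next_mode` with the measured distance at every step and then apply `velocity` for the resulting mode.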
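Compression of deep learning models is of fundamental importance for deploying them on edge devices. Incorporating the hardware model and application constraints during compression maximizes the benefits but makes the result specific to one case. Compression therefore needs to be automated. Searching for the optimal parameters of a compression method is treated as an optimization problem. This article introduces a Multi-Objective Hardware-Aware Quantization (MOHAQ) method, which considers both hardware efficiency and inference error as objectives for mixed-precision quantization. The method makes it feasible to evaluate candidate solutions in a large search space by relying on two steps. First, post-training quantization is applied for fast solution evaluation. Second, we propose a search technique named "beacon-based search", which retrains only selected solutions in the search space and uses them as beacons to estimate the effect of retraining on other solutions. To evaluate the optimization potential, we choose a speech recognition model trained on the TIMIT dataset. The model is based on the Simple Recurrent Unit (SRU), chosen for its considerable speedup over other recurrent units. We apply our method to two platforms: SiLago and Bitfusion. Experimental evaluation shows that the SRU can be compressed by up to 8x through post-training quantization without any significant increase in error, the error increasing by only 1.5 percentage points. On SiLago, the inference-only search finds solutions that achieve 80% and 64% of the maximum possible speedup and energy saving, respectively, with a 0.5 percentage point increase in error. On Bitfusion, under the constraint of a small SRAM size, beacon-based search reduces the error increase of the inference-only search by 4 percentage points and raises the achievable speedup to 47x compared with the Bitfusion baseline.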
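The first step, fast evaluation of a mixed-precision candidate by post-training quantization, can be sketched as follows. The sketch assumes symmetric uniform quantization per layer and uses total weight size and mean squared quantization error as stand-ins for the paper's objectives (hardware efficiency and inference error). The names `quantize` and `evaluate` are hypothetical.

```python
def quantize(weights, bits):
    """Symmetric uniform quantization to signed `bits`-bit integers, then dequantize.

    Requires bits >= 2 so that at least one positive level exists.
    """
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(w) for w in weights) / qmax
    if scale == 0:
        return list(weights)  # all-zero layer: nothing to quantize
    return [max(-qmax, min(qmax, round(w / scale))) * scale for w in weights]

def evaluate(layers, bit_widths):
    """Score one mixed-precision candidate on two objectives.

    layers:     list of weight lists, one per layer.
    bit_widths: one bit width per layer (the candidate solution).
    Returns (total size in bits, mean squared quantization error).
    """
    size = sum(len(w) * b for w, b in zip(layers, bit_widths))
    sq_err = sum(
        (a - q) ** 2
        for w, b in zip(layers, bit_widths)
        for a, q in zip(w, quantize(w, b))
    )
    count = sum(len(w) for w in layers)
    return size, sq_err / count
```

A multi-objective search would call `evaluate` on many bit-width assignments and keep the candidates that are not dominated on both objectives; the retraining of selected "beacon" solutions is not covered by this sketch.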